Memory-guided Exploration in Reinforcement Learning
Authors
Abstract
The life-long learning architecture attempts to create an adaptive agent by incorporating prior knowledge over the lifetime of a learning agent. Our paper focuses on task transfer in reinforcement learning, specifically in Q-learning. There are three main model-free methods for performing task transfer in Q-learning: direct transfer, soft transfer, and memory-guided exploration. In direct transfer, Q-values from a previous task are used to initialize the Q-values of the next task. Soft transfer initializes the Q-values of the new task with a weighted average of the standard initialization value and the Q-values of the previous task. In memory-guided exploration, the Q-values of previous tasks are used as a guide in the agent's initial exploration; the weight the agent gives to its past experience decreases over time. We explore stability issues related to the off-policy nature of memory-guided exploration and compare memory-guided exploration to soft transfer and direct transfer in three different environments.
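To make the three transfer schemes concrete, here is a minimal Python sketch of each. The names (`q_old`, `Q_INIT`) and the exponential decay schedule in the memory-guided variant are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

Q_INIT = 0.0  # assumed standard initialization value for a fresh Q-table


def direct_transfer(q_old: np.ndarray) -> np.ndarray:
    """Direct transfer: start the new task with the old task's Q-values."""
    return q_old.copy()


def soft_transfer(q_old: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Soft transfer: weighted average of the standard initialization
    value and the old task's Q-values."""
    return w * q_old + (1.0 - w) * Q_INIT


def memory_guided_action(q_new: np.ndarray, q_old: np.ndarray,
                         state: int, t: int, decay: float = 0.99) -> int:
    """Memory-guided exploration: choose actions by mixing old and new
    Q-values, with the weight on past experience decaying over time
    (an assumed schedule), so the old policy guides early exploration
    while the new task's values dominate later."""
    w_past = decay ** t  # influence of the previous task fades with t
    scores = w_past * q_old[state] + (1.0 - w_past) * q_new[state]
    return int(np.argmax(scores))
```

Note that in the memory-guided case only the action selection is biased by the old task; the new Q-table is still updated from the new task's rewards, which is the source of the off-policy behavior the abstract mentions.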
Similar Papers
Efficient Exploration in Reinforcement Learning Based on Utile Suffix Memory
Reinforcement learning addresses the question of how an autonomous agent can learn to choose optimal actions to achieve its goals. Efficient exploration is of fundamental importance for autonomous agents that learn to act. Previous approaches to exploration in reinforcement learning usually address exploration in the case when the environment is fully observable. In contrast, we study the case ...
Efficient Exploration in Reinforcement Learning Based on Short-term Memory
Reinforcement learning addresses the question of how an autonomous agent that senses and acts in its environment can learn to choose optimal actions to achieve its goals. It is related to the problem of learning control strategies. In practice multiple situations are usually indistinguishable from immediate perceptual input. These multiple situations may require different responses from the age...
Reducing state space exploration in reinforcement learning problems by rapid identification of initial solutions and progressive improvement of them
Most existing reinforcement learning methods require exhaustive state space exploration before converging towards a problem solution. Various generalization techniques have been used to reduce the need for exhaustive exploration, but for problems like maze route finding these techniques are not easily applicable. This paper presents an approach that makes it possible to reduce the need for stat...
Dual Memory Model for Using Pre-existing Knowledge in Reinforcement Learning Tasks
Reinforcement learning agents explore their environment in order to collect reward that allows them to learn what actions are good or bad in what situations. The exploration is performed using a policy that has to keep a balance between getting more information about the environment and exploiting what is already known about it. This paper presents a method for guiding exploration by pre-existi...
Bi-Memory Model for Guiding Exploration by Pre-existing Knowledge
Reinforcement learning agents explore their environment in order to collect reward that allows them to learn what actions are good or bad in what situations. The exploration is performed using a policy that has to keep a balance between getting more information about the environment and exploiting what is already known about it. This paper presents a method for guiding exploration by pre-existi...